Amendments to the Claims 

Please amend the claims according to the following directions. Please replace all prior 
versions and listings of claims in this application with the following list of claims: 

1-30 (previously cancelled). 

31. (previously presented) A multiple store buffer forwarding apparatus, comprising: 

a processor having a write combining buffer, and 

a non-volatile memory coupled to the processor, said non-volatile memory storing 
instructions which when executed by the processor cause the processor to: 

execute a plurality of store instructions referencing a first memory region; 

execute a load instruction referencing a second memory region; 

determine that the second memory region matches a cacheline address; 

determine that the first memory region completely covers the second memory region; and 

transmit a store forward is OK signal. 

32. (previously presented) The multiple store buffer forwarding apparatus of claim 31, wherein 
the write combining buffer includes: 

a comparator to receive and compare an address of the second memory region with all 
existing cacheline addresses in the write combining buffer, 
an address and data buffer coupled to the comparator, 
a data valid bits buffer coupled to the address and data buffer, 
a multiplexer coupled to the data valid bits buffer, and 
a comparison circuit coupled to the multiplexer. 

33. (previously presented) The multiple store buffer forwarding apparatus of claim 32, wherein 
the multiplexer is to: 

receive a byte valid vector from the data valid bits buffer, 
receive address bits from the load instruction, 
select a group of valid bits from the byte valid vector, and 
output the group of valid bits. 

34. (previously presented) The multiple store buffer forwarding apparatus of claim 33, wherein 
the comparison circuit is to: 

receive the group of valid bits; 

receive an incoming load instruction byte mask; 

determine that it is acceptable to forward the data using the group of valid bits and the 
incoming load instruction byte mask; and 
produce a forward OK signal. 
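
For illustration only, the multiplexer selection recited in claim 33 and the comparison recited in claim 34 might be modeled as in the sketch below. The 64-byte line, the 8-byte group width, and the names `select_valid_group` and `forward_ok` are assumptions made for the example and are not claim language.

```python
LINE_BYTES = 64   # assumed cacheline size; not stated in the claims
GROUP_BYTES = 8   # assumed width of the group the multiplexer selects


def select_valid_group(byte_valid_vector: int, load_address: int) -> int:
    """Model of the multiplexer: use load address bits to pick one group
    of valid bits out of the line's byte valid vector (bit i = byte i)."""
    offset_in_line = load_address % LINE_BYTES
    group_index = offset_in_line // GROUP_BYTES
    group_mask = (1 << GROUP_BYTES) - 1
    return (byte_valid_vector >> (group_index * GROUP_BYTES)) & group_mask


def forward_ok(valid_group: int, load_byte_mask: int) -> bool:
    """Model of the comparison circuit: forwarding is acceptable only if
    every byte requested by the load is marked valid."""
    return (load_byte_mask & ~valid_group) == 0
```

In this model a load whose byte mask requests any byte that no prior store has written yields no forward OK signal.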

35. (previously presented) The multiple store buffer forwarding apparatus of claim 31, wherein 
said processor is implemented as a multi-processor having associated with each said 
multi-processor a separate set of hardware resources. 

36. (previously presented) A multiple store buffer forwarding apparatus, comprising: 

a memory; 

a processor coupled to said memory and having a write combining buffer, said processor to 

execute a plurality of store instructions referencing a first memory region of said memory; 
execute a load instruction referencing a second memory region of said memory; 
determine that the second memory region matches a cacheline address; 
determine that the first memory region completely covers the second memory region; and 
transmit a signal indicating that store buffer forwarding is authorized. 

37. (previously presented) The multiple store buffer forwarding apparatus of claim 36, wherein 
the write combining buffer includes: 

a comparator to receive and compare an address of the second memory region with all 
existing cacheline addresses in the write combining buffer, 
an address and data buffer coupled to the comparator, 
a data valid bits buffer coupled to the address and data buffer, 
a multiplexer coupled to the data valid bits buffer, and 
a comparison circuit coupled to the multiplexer. 

38. (previously presented) The multiple store buffer forwarding apparatus of claim 36, wherein 
the multiplexer is to: 

receive a byte valid vector from the data valid bits buffer, 
receive address bits from the load instruction, 
select a group of valid bits from the byte valid vector, and 
output the group of valid bits. 

39. (previously presented) The multiple store buffer forwarding apparatus of claim 38, wherein 
the comparison circuit is to: 

receive the group of valid bits; 

receive an incoming load instruction byte mask; 

determine that it is acceptable to forward the data using the group of valid bits and the 
incoming load instruction byte mask; and 

produce a signal indicating that it is acceptable to forward the data. 

40. (previously presented) The multiple store buffer forwarding apparatus of claim 36, wherein 
said processor is implemented as multiple processors wherein a separate set of hardware 
resources is associated with each of said multiple processors. 

41. (new) An apparatus, comprising: 

a processor; 

a write combining buffer (WCB) coupled to the processor, the WCB to combine store 
data from a plurality of processor store operations into a single WCB entry; 
a comparison circuit coupled to the WCB, 

the comparison circuit to receive a load operation from the processor, 

the comparison circuit to compare a memory region requested by the load operation to 
addresses of the store data in the WCB, and 

the comparison circuit to generate a signal indicating that store buffer forwarding is 
authorized for the load instruction if the memory region requested by the load operation can be 
globally observed in a single atomic transaction and if the store data in the WCB completely 
covers the memory region requested by the load operation. 

42. (new) The apparatus of claim 41, wherein each WCB entry is sized to match a system cache 
line size. 
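
For illustration only, the behavior recited in claims 41 and 42 might be modeled as in the sketch below: several stores are combined into one line-sized WCB entry, and a load is forwarded only when the combined store data completely covers it. The 64-byte line size, the class and method names, and the use of "the load does not cross a cache line" as a stand-in for the single atomic transaction condition are all assumptions made for the example and are not claim language.

```python
LINE_BYTES = 64  # assumed cache line size; claim 42 only says entries match it


class WriteCombiningBuffer:
    def __init__(self):
        # line base address -> [line data, per-byte valid bitmask]
        self.entries = {}

    def store(self, address: int, data: bytes) -> None:
        """Combine a store into the single entry for its cache line.
        Assumes the store does not cross a line boundary."""
        offset = address % LINE_BYTES
        base = address - offset
        assert offset + len(data) <= LINE_BYTES
        entry = self.entries.setdefault(base, [bytearray(LINE_BYTES), 0])
        entry[0][offset:offset + len(data)] = data
        entry[1] |= ((1 << len(data)) - 1) << offset

    def try_forward(self, address: int, size: int):
        """Return the forwarded bytes if forwarding is authorized, else None."""
        offset = address % LINE_BYTES
        base = address - offset
        if offset + size > LINE_BYTES:
            return None  # assumed proxy for the atomic-transaction condition
        entry = self.entries.get(base)
        if entry is None:
            return None  # no entry matches the load's line address
        needed = ((1 << size) - 1) << offset
        if needed & ~entry[1]:
            return None  # store data does not completely cover the load
        return bytes(entry[0][offset:offset + size])
```

In this model two adjacent 2-byte stores to the same line allow a 4-byte load spanning both to be forwarded, while a load reaching past the written bytes is not.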